eMule uses various means to make sure that files in the network are shared and downloaded without errors. Should an error occur, called corruption, eMule has advanced functions to correct this corruption with a minimum of additional data to redownload.
File Hash and ICH - Intelligent Corruption Handling
File Hash, Part Hashes & Hashset
For every file shared in the network an unique identification value is created
using the mathematical crypto algorithm MD4. This value is called file hash
and is contained in every standard eD2k link, e.g.
ed2k://|file|name|12043984|6744FC42EDA527B27F0B2F2538728B3E|/
where 6744FC42EDA527B27F0B2F2538728B3E is the file hash making this file uniquely
identified all over the network.
This File Hash is calculated by dividing the entire file into parts
of 9.28 MB. For each of the parts a Part Hash is calculated using the same MD4
algorithm. These Part Hashes, called Hashset, are then used
to calculate the final File Hash. For example a 600 MB file would be divided
into 65 parts each with its own Part Hash which are then used to create
the final File Hash.
To make sure that eMule always receives the correct Hashset a special link can
be created containing this, e.g.
ed2k://|file|name|12043984|6744FC42EDA527B27F0B2F2538728B3E|p=264E6F6B587985D87EB0157A2A7BAF40:17B9A4D1DCE0E4C2B672DF257145E98A|/
where the p= value denotes the Hashset. Each Part Hash is divided by a ":". This file has a size of 12043984 Bytes (=11.49 MB) which means it has one full 9.28 part and the rest left to 11.49 MB resulting in two Part Hashes.
ICH Intelligent Corruption Handling
Whenever eMule finishes such a part it will be checked if the downloaded data
matches the Part Hash value for this finished part. If yes, this part is offered
for upload to help spreading it.
If no, a corruption has occurred and the part has to be downloaded again. To
avoid downloading the full 9.28 MB, ICH redownloads 180 KB from the beginning
of the part and then check the whole part again to see if the Part Hash is now
correct. If not, the next 180 KB are downloaded, check again etc. until the
part hash is correct again. In the best case eMule has to download only 180
KB again if the corruption is right at the beginning of the part. In the worst
case the entire part has to be downloaded again if the corruption is somewhere
near the end of the part. On average ICH saves 50% of redownloading in case
of part corruptions.
AICH - Advanced Intelligent Corruption Handling
The standard ICH is pretty effective although it has its limitations as only
the whole 9.28 MB can be verified and no finer blocks. If more than one position
is corrupted or if malicious clients spread corrupted data over and over again
or even fake an entire Part Hash, ICH is no long effective.
Here AICH will care for complete data integrity with a minimum cost of redownloading
or overhead by creating much finer hashes.
Root Hash, Block Hashes & AICH Hashset
This time our starting point are the 9.28 MB parts in a file. Each part is divided
into 180 KB blocks, resulting in 53 blocks per part and for each block a hash
value is calculated using the SHA1 hash algorithm. These values are called Block
Hashes and form the lowest level of a complete AICH Hashset.
The image above shows how such a full hash tree is build upon the blocks of
a full 4 parts file. Each part contains 53 blocks resulting in 212 Block
Hashes which builds up a hash tree of further 7 Levels until the Root
Hash is reached. The entire tree is called AICH Hashset.
The green and yellow dots show the mathematical dependencies of the smallest
Block Hash to the Root Hash. This means if we have a trusted
Root Hash the entire tree can be verified against it.
eMule can create links containing the Root Hash, e.g.
ed2k://|file|name|12043984|6744FC42EDA527B27F0B2F2538728B3E|h=A2NWOTYURUU3P3GCUB6KCNW3FTYYELQB|/
where h= is the Root Hash. For releases this value should be included as it significantly improves the corruption resistance of the file by providing a trusted Root Hash. Read Trusting the Root Hash
Recovering from a corruption
Whenever eMule detects a corruption in a part it requests a Recovery Packet
from a random client with a complete AICH Hash Set. This Recovery Packet contains
all 53 Block Hashes of the corrupted part and a number of Verifying
Hashes of the entire hash tree. The image above shows a Recovery Packet
for a 4 parts file. The number of Verifying Hashes is determined by
the part count of the file (2^x >= 'part count', with x = Number of Verifying
Hashes).
After receiving the Recovery Packet eMule checks the Verifying Hashes
against its trusted Root Hash. If they match, eMule checks all 53 blocks of
the corrupted part against the Block Hashes from the Recovery Packet.
AICH then restores all blocks which match against their Block Hash
to leave only the corrupted block(s) for redownload.
In the Log a successful recovery will look like:
09.09.2004 02:43:43: Downloaded part 6 is corrupt ([file])
09.09.2004 02:43:46: AICH successfully recovered 8.22 MB of 9.28 MB from part
6 for [file]
Trusting the Root Hash
The best thing is downloading from a link with a Root Hash. Assuming
that the source of this link is trustworthy the Root Hash is trusted at once
and saved to disk for this file.
If no Root Hash is provided in the link eMule has to trust the Root Hash, which
the sources for the file send. It only trusts a Root Hash if at least
10 different sources send the same value and if at least 92% of all sources
agree to this value. Because this Root Hash is not as trustworthy it
is only valid for the current session and does not get saved nor can links with
Root Hash be created.
Once eMule has build an entire AICH Hashset, i.e. the file is finished,
it starts propagating the Root Hash to other clients.
Notes: | |
• | New releases or rare files will probably not have enough full sources to generate a trusted Root Hash. It is recommended to Release files with attached hash. |
• | If there is no Root Hash or even a faked one eMule will be able to successfully download and finish the file under normal conditions. The AICH feature cannot be used in this case. |
• | As AICH Hashsets can be very large they are not stored in memory but in the known2.met and are only read upon demand. |
• | AICH will only be effective for eMule clients v.44a and above but retains backward compatibility to older clients. |
Applies to: v.44a+
Update on: 2004-09-11 by Monk